- STATS 3 MENU
-
- REGRESSION
- For the tests that follow, all except LOGIT regression have
- similar input and output structures. You will be asked for the
- variables that are the independent variables and for the one
- dependent variable. You will then be asked for the variable
- (column) into which the calculated values should be placed. The
- program does not place the residuals in a variable (column), as
- this would restrict the number of variables that could actually
- be used in the regression. To get the residuals, simply subtract
- the calculated data from the actual in the data editor. The
- differences lie in additional parts of the regressions.
-
- -Multiple regression is a traditional regression.
-
- -Ridge regression will require the entry of a ridge factor, which
- should be small and between 0 and 1 (most often below .2).
-
- -Stepwise regression is like multiple regression, except that you
- specify all independent variables to be considered. The program
- decides which of these to actually use in the regression.
-
- -Cochran refers to a regression done using the Cochrane-Orcutt
- procedure. A "Cochran" factor between 0 and 1 must be used.
- This type of regression actually uses a part of the previous point
- in the calculation. If the Cochran factor is 1, then the regression
- is actually calculated upon the first differences of the
- variables.
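- The quasi-differencing step above can be sketched in a few lines of
- Python (a minimal numpy illustration, not B/STAT's actual code; the
- function name and the simulated data are invented):

```python
import numpy as np

def cochrane_orcutt_pass(x, y, rho):
    """One Cochrane-Orcutt pass: quasi-difference both series with the
    factor rho, then fit ordinary least squares on the transform.
    With rho = 1 this becomes a regression on first differences."""
    x_star = x[1:] - rho * x[:-1]
    y_star = y[1:] - rho * y[:-1]
    X = np.column_stack([np.ones_like(x_star), x_star])
    beta, *_ = np.linalg.lstsq(X, y_star, rcond=None)
    return beta                      # [intercept, slope]

# Simulated data: y = 1 + 2x with AR(1) errors (rho = 0.8).
rng = np.random.default_rng(0)
x = np.arange(50.0)
e = np.zeros(50)
for t in range(1, 50):
    e[t] = 0.8 * e[t - 1] + rng.normal(scale=0.1)
y = 1.0 + 2.0 * x + e
beta = cochrane_orcutt_pass(x, y, rho=0.8)   # slope close to 2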
-
- -Huber regression is used to reduce the weight given to outliers
- in the data. You will need to specify two additional pieces of
- data. The first is the variable into which the program places the
- weights, and the second is the value of the residual at which the
- weights should start to be changed. This procedure can only be
- used after first doing a traditional regression.
-
- -Weighted regression requires you to specify a weight variable
- before execution.
-
- -Chow regression is a simple modification of multiple regression.
- It is used to see if the regression parameters are constant over
- the scope of the data variables. You will have to specify the
- number of points to keep in the first sample.
-
- -LOGIT regression is used when the dependent variable is to be
- constrained to a value above 0 but below 1. LOGIT setup converts
- unsummarized data to the form required by the regression program.
- (Save original data first!)
-
- -PROBIT regression is similar to LOGIT regression. The difference
- is the type of curve that is fit to the data. The logit fits a
- logistic curve to the data while the probit fits a normal
- distribution to the data. Except at the extremes (close to zero or
- 1) the difference between the results is very slight. PROBIT setup
- converts unsummarized data to the form required by the regression
- program. Traditionally, in the probit transform, 5 was added to
- the normal deviate to avoid negative numbers. I have dispensed
- with that addition to simplify the result. I think that in the
- 1990s we all are comfortable with negatives. As a result the
- constant from B/STAT will be 5 lower than from traditional
- packages.
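- The two transforms can be written in a couple of lines using only
- Python's standard library (an illustration of the conventions just
- described; the function names are mine, not B/STAT's):

```python
import math
from statistics import NormalDist

def logit(p):
    """Logit transform: the log-odds of a proportion 0 < p < 1."""
    return math.log(p / (1.0 - p))

def probit(p):
    """Probit transform: the normal deviate for p, WITHOUT the
    traditional +5 offset, matching the convention described above."""
    return NormalDist().inv_cdf(p)

half = probit(0.5)       # 0: with no offset, values below p=0.5 are negative
upper = probit(0.975)    # about 1.96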
-
- -Non Linear regression refers to a regression where the form is
- not linear in the parameters. In such a case the usual mathematical
- procedures do not work. In this case you will be asked for the
- dependent variable, a variable containing standard errors of the
- measured points, and a variable to place the results in. You will
- not be asked for the independent variables. Instead you will be
- asked to enter the equation. This equation is of the form Y=f(X)
- except that you will use the column letters ("a" "b" etc) for the
- independent variables. Each parameter that you wish to estimate
- will have the form "PARM1" "PARM2" etc.
- If we wanted to estimate "a" and "b" in the following formula
-
- Y=a(1-EXP(-bX))
-
- we would enter
-
- PARM1*(1-EXP(-1*PARM2*a))
-
- if the X variable was in column "a" of the spreadsheet.
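- For the curve above, a crude fit can be sketched without a special
- solver: grid-search over PARM2 and solve PARM1 in closed form at
- each candidate (illustration only; B/STAT's own solver is not
- described in this text, and the data here are invented):

```python
import numpy as np

def fit_saturation(x, y):
    """Fit y = a*(1 - exp(-b*x)) -- the PARM1/PARM2 example above --
    by a grid search over b, solving for a in closed form at each
    candidate.  A sketch, not B/STAT's actual algorithm."""
    best_a, best_b, best_sse = 0.0, 0.0, np.inf
    for b in np.linspace(0.05, 2.0, 400):
        g = 1.0 - np.exp(-b * x)              # model shape with a = 1
        a = float(g @ y / (g @ g))            # least-squares a given b
        sse = float(((y - a * g) ** 2).sum())
        if sse < best_sse:
            best_a, best_b, best_sse = a, b, sse
    return best_a, best_b

x = np.linspace(0.1, 5.0, 40)
y = 3.0 * (1.0 - np.exp(-0.7 * x))            # noise-free: a=3, b=0.7
a_hat, b_hat = fit_saturation(x, y)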
-
- -Principal Components is not actually a regression method at all.
- It is a process used to reduce the number of variables needed to
- explain the variation in the data. The resultant variables are
- orthogonal; that is, the correlation between any two variables is
- 0. Regression can often then be carried out against these pseudo-
- variables. The process is destructive, in that it wipes out the
- existing variables. Each new one is a linear combination of the
- original variables.
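- The orthogonality property can be checked with a small sketch
- (eigen-decomposition of the covariance matrix in numpy; this mirrors
- the idea of the procedure, not necessarily B/STAT's computation):

```python
import numpy as np

def principal_components(data):
    """Return the principal-component scores: orthogonal linear
    combinations of the centred columns, largest variance first."""
    centred = data - data.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(centred, rowvar=False))
    order = np.argsort(vals)[::-1]
    return centred @ vecs[:, order]

# Two strongly correlated columns reduce to orthogonal components.
rng = np.random.default_rng(1)
base = rng.normal(size=100)
data = np.column_stack([base, 2 * base + rng.normal(scale=0.1, size=100)])
scores = principal_components(data)
corr = np.corrcoef(scores, rowvar=False)[0, 1]   # essentially 0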
-
- -Correlation matrix shows the correlation between a group of
- variables, rather than doing a full regression. This is often done
- to look at the effects of multi-collinearity on the data.
-
- TIME SERIES
- These are methods of smoothing or projecting data. They are often
- used in combination with other procedures.
-
- -Moving average requires you to choose the variable and the period
- of the moving average. As well, you must select a variable into
- which the averaged variable will be placed.
-
- -Geometric moving average requires the same input as linear moving
- average.
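- Both averages can be sketched as follows (illustration only; the
- choice of variables and columns in B/STAT works as described above):

```python
import numpy as np

def moving_average(x, period):
    """Arithmetic moving average over a fixed period."""
    return np.convolve(x, np.ones(period) / period, mode="valid")

def geometric_moving_average(x, period):
    """Geometric mean over the same window (values must be positive)."""
    return np.exp(moving_average(np.log(x), period))

x = np.array([1.0, 2.0, 4.0, 8.0])
lin = moving_average(x, 2)               # pairwise arithmetic means
geo = geometric_moving_average(x, 2)     # pairwise geometric means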
-
- -Fourier smoothing requires a variable to smooth and a variable to
- place the result. It also asks for the number of terms to be kept
- in the intermediate calculations. This value should be less than
- 50, usually less than 15. There must be no missing data for this
- procedure to work. Note that this can be a slow process.
-
- -Linear smoothing requires a variable to smooth and a variable to
- place the result. A linear regression is made assuming that the
- independent variable is a simple counter from 1 to the number of
- rows used. The equation is
-
- Y=a+b.t
-
- -Polynomial smoothing fits a power series to the data. In
- addition to the variable to smooth and the result variable you
- must input the degree of the polynomial. A power of 1 is a linear
- regression. A power of 2 fits the curve
-
- Y=a+b.t+c.t.t
-
- A power of 3 fits
-
- Y=a+b.t+c.t.t+d.t.t.t
-
- etc
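- The fit against the counter t = 1, 2, 3, ... can be sketched with
- numpy's polynomial routines (a hedged illustration; the degree-1
- case is the linear smoothing above):

```python
import numpy as np

def polynomial_smooth(y, degree):
    """Fit a degree-n polynomial in t = 1..len(y) and return the
    fitted values; degree 1 is the linear smoothing described above."""
    t = np.arange(1, len(y) + 1)
    return np.polyval(np.polyfit(t, y, degree), t)

y = np.array([1.0, 4.0, 9.0, 16.0, 25.0])   # exactly t**2
smoothed = polynomial_smooth(y, 2)           # degree 2 recovers it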
-
- -Exponential Form fits an equation such that
-
- Y=EXP(a+b.t)
-
- This is called exponential form to distinguish from exponential
- smoothing which is a totally different process.
-
- -S-Shape smoothing fits the following curve
-
- Y=EXP(a+b/t)
-
- Such a curve will rise and then approach EXP(a) if "b" is
- negative. If "b" is positive then the curve will drop to approach
- EXP(a).
-
- -Brown 1-way exponential smoothing is simple exponential smoothing.
- You will be asked to specify the variable to smooth, and a
- variable in which to store the result. In addition, you will need
- a smoothing constant (0 to 1) and a starting value. If you do not
- specify the starting value, the program will generate one. This
- process is not designed for data with a distinct trend line. If
- there is a distinct linear trend, then 2-way exponential smoothing
- should be used.
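- The recursion behind simple exponential smoothing is short (a
- sketch; when no start is supplied I fall back to the first
- observation, since the rule B/STAT uses to generate a starting
- value is not stated):

```python
def brown_smooth(x, alpha, start=None):
    """Brown's one-way (simple) exponential smoothing.  If no start
    is given, the first observation is used; the rule B/STAT uses to
    generate its starting value is not stated in the text."""
    s = x[0] if start is None else start
    out = []
    for value in x:
        s = alpha * value + (1.0 - alpha) * s
        out.append(s)
    return out

smoothed = brown_smooth([10.0, 12.0, 11.0, 13.0], alpha=0.5)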
-
- -Brown's 2-way exponential smoothing uses linear regression to
- estimate a starting value and trend. You must estimate the
- smoothing coefficient and variable to smooth, and variable for
- result.
-
- -Holt's 2-way exponential smoothing is similar to Brown's, except
- that a separate smoothing coefficient is used for the trend
- factor. Also you may enter initial values for the level and
- trend.
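- The separate level and trend updates can be sketched as (a standard
- form of Holt's recursion; B/STAT's initialisation details may
- differ):

```python
def holt_smooth(x, alpha, beta, level=None, trend=0.0):
    """Holt's two-way smoothing: separate coefficients for level
    (alpha) and trend (beta), with optional initial values."""
    if level is None:
        level = x[0]
    out = []
    for value in x:
        prev = level
        level = alpha * value + (1.0 - alpha) * (level + trend)
        trend = beta * (level - prev) + (1.0 - beta) * trend
        out.append(level)
    return out

smoothed = holt_smooth([1.0, 2.0, 3.0, 4.0], alpha=0.5, beta=0.5)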
-
- -Multiplicative exponential smoothing is almost identical to
- Holt's. The difference is that the trend factor is taken as a
- proportionate increase in value rather than a constant to add.
- Thus .02 does not mean that the trend is initially an increment
- of .02 but rather a percentage increase of 2%.
-
- -Winter's exponential smoothing is used if there is a seasonal
- aspect to the data (like retail sales which have a December peak).
- You will have to enter 4 quantities. The first is the smoothing
- coefficient for level. The second is for trend. The third is for
- seasonality. The fourth value is the period of seasonality. Note
- that this method should not be used with data fluctuating above
- and below zero. With data that go below zero, add a constant to
- the data to eliminate negative values. Then, after smoothing,
- subtract the constant.
-
- Interpolation
- B/STAT uses 4 forms of estimating unavailable data.
-
- -Simple linear interpolation requires that you simply select the
- variable.
-
- -Geometric interpolation. Basically the same as linear
- interpolation except that the assumption is that the points are
- connected by a multiplicative relationship rather than additive.
-
- -Lagrangian interpolation requires two variables: an "X" variable
- and a "Y" variable. There can be no missing "X" values. This
- can be slow with a large data set, since each point is used in
- estimating missing data.
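- The estimate for a missing point can be sketched directly from the
- Lagrange formula (illustration only; the double loop over all known
- points is what makes the method slow on large data sets):

```python
def lagrange_estimate(xs, ys, x):
    """Estimate y at x with the Lagrange interpolating polynomial.
    Every known point enters every estimate, which is why the method
    slows down on large data sets."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total

# Three points on y = x**2; interpolating at x = 3 recovers 9.
y = lagrange_estimate([1.0, 2.0, 4.0], [1.0, 4.0, 16.0], 3.0)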
-
- -Cubic spline interpolation assumes that the data set in the
- selected variable consists of evenly-spaced observations.
-
- EXTRACT
- These selections allow you to reduce the size of the data set. The
- first option sums the data. For example, if you want to get yearly
- totals from a data set of monthly data, you can extract summed data
- and reduce the data by a factor of 12. Each element would then be
- a yearly total. In the non-summed case, only every 12th value would
- be left. No summing would be done. This is useful if you want to
- look at subsets in isolation.
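- The monthly-to-yearly example can be sketched as (an illustration;
- in the non-summed case I keep the last value of each block, since
- the text does not say which of the 12 B/STAT retains):

```python
def extract(data, factor, summed=True):
    """Reduce a series by `factor`: either total each block of
    `factor` values (monthly -> yearly totals) or keep one value per
    block (the last; the text does not say which one B/STAT keeps)."""
    blocks = [data[i:i + factor] for i in range(0, len(data), factor)]
    if summed:
        return [sum(b) for b in blocks]
    return [b[-1] for b in blocks]

monthly = list(range(1, 25))                  # two years of monthly data
yearly = extract(monthly, 12)                 # yearly totals
sampled = extract(monthly, 12, summed=False)  # one value per year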
-
- MISCELLANEOUS
- This menu has three procedures, in addition to the usual help
- selection.
- -Crosstabs is used to summarize data contained in two or
- three variables. It produces a count for each combination of values
- in the chosen variables. For example, you may have data on the
- height and weight of a group of army recruits. You could use
- crosstabs to find out the number in each height and weight
- classification, where these could be height in 2-inch increments
- and weight in 5-pound increments. It is most commonly used in
- market research for crosses, such as between age 30 and 34 and
- earning between 20,000 and 30,000 dollars per year.
-
- You first select the variables to use in the crosstab. If you
- select two, then a 2-way crosstab is done. If three, then a 3-way
- crosstab is done. Next, you select the break points for the
- classes in each variable. There may be up to 14 breakpoints,
- giving a maximum of 15 classes for each variable. You need only
- type in as many breakpoints as there are for a specific
- variable, and leave the rest blank. The number of break points can
- be different for each variable. Note that the lower class includes
- the break point value. Thus, a breakpoint of 200 pounds would put
- 200-pound people in the lower class and 200.01 pound people in the
- higher class. The program will print out the results. If you want,
- you may replace the data in memory with the summarized totals.
- This can be quite useful if you then want to perform a Chi square
- test, type 2, on the result to see if there are any significant
- relationships.
- One factor crosstabs are available. If you choose only one variable
- then the program will generate a new data matrix composed of 2
- variables only. There will be one entry for each unique value in
- the chosen variable. The second variable will be the number of
- occurrences of that value in the original variable. This is a
- destructive process which erases all original data.
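- The counting behind a 2-way crosstab can be sketched as follows
- (illustration only; the height/weight numbers are invented, and
- bisect_left reproduces the boundary rule above, where a value equal
- to a breakpoint falls in the lower class):

```python
from bisect import bisect_left
from collections import Counter

def crosstab(var1, var2, breaks1, breaks2):
    """Two-way crosstab: count observations in each class pair.
    bisect_left puts a value equal to a breakpoint into the lower
    class, matching the boundary rule described above."""
    counts = Counter()
    for a, b in zip(var1, var2):
        counts[(bisect_left(breaks1, a), bisect_left(breaks2, b))] += 1
    return counts

heights = [68, 70, 70, 73]       # one breakpoint at 70 (70 is "lower")
weights = [150, 200, 210, 180]   # one breakpoint at 200
table = crosstab(heights, weights, breaks1=[70], breaks2=[200])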
-
- -Difference is a rather simple process. The difference of a
- variable is simply the amount of its change from one period to the
- next. Sometimes some procedures will work better on the change in
- a variable rather than the variable itself. This is especially
- true in Box Jenkins analysis. You merely supply the variable to
- difference and the variable into which to place the result.
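- The operation itself is one line (a sketch of what the procedure
- computes; note the result is one observation shorter than the
- input):

```python
def difference(x):
    """First difference: the change from each period to the next.
    The result is one observation shorter than the input."""
    return [b - a for a, b in zip(x, x[1:])]

d = difference([3.0, 5.0, 4.0, 7.0])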
-
- -Box Cox Transforms are used to transform a variable so that the
- values are normally distributed. The Box Cox procedure uses a
- variable called "lambda". You must provide the minimum lambda to
- test as well as the maximum. You also must specify the number of
- steps to use in going from the minimum to the maximum. The
- program will select the best value of lambda from the ones that
- it tests. The variable to test must have all values greater than
- zero. You also specify a variable into which the result will be
- placed.
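- The lambda search can be sketched as a grid scan that keeps the
- value with the highest normal log-likelihood (a hedged sketch:
- B/STAT's exact selection criterion is not stated in this text, and
- the sample data are invented):

```python
import math

def box_cox(value, lam):
    """Box-Cox transform of one positive value for a given lambda."""
    if lam == 0.0:
        return math.log(value)
    return (value ** lam - 1.0) / lam

def best_lambda(data, lam_min, lam_max, steps):
    """Grid search from lam_min to lam_max in `steps` steps, keeping
    the lambda whose transform maximises the normal log-likelihood.
    A sketch of the search; B/STAT's exact criterion is not stated."""
    best_lam, best_ll = lam_min, -math.inf
    n = len(data)
    log_sum = sum(math.log(v) for v in data)
    for k in range(steps + 1):
        lam = lam_min + (lam_max - lam_min) * k / steps
        t = [box_cox(v, lam) for v in data]
        mean = sum(t) / n
        var = sum((v - mean) ** 2 for v in t) / n
        ll = -0.5 * n * math.log(var) + (lam - 1.0) * log_sum
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam

# Squares of evenly spaced values: a lambda near 0.5 roughly
# linearises them, so the chosen value lands between 0 and 1.
data = [v * v for v in [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]]
lam = best_lambda(data, 0.0, 2.0, 20)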
-
-
-